5.0 (1 rating)
Created by Soumyadeep Dey
English
This is Volume 2 of the Data Engineering course. In this course I will cover the open-source data processing technologies Spark and Kafka, the most widely used frameworks for batch and stream processing. You will learn Spark from Level 100 to Level 400 through real-life hands-on exercises and projects, and you will be introduced to a data lake on AWS (S3) and a data lakehouse using Apache Iceberg.
AWS will be used as the hosting platform, and I will cover the AWS services EMR, S3, and MSK, as well as Databricks as a Spark hosting platform. I will also show Spark integration with other services such as AWS RDS (MySQL or PostgreSQL) and Redshift.
You will get opportunities to work hands-on with large datasets (100-300 GB or more). The course provides exercises that match real-world scenarios: Spark batch processing, stream processing, performance tuning, streaming ingestion, window functions, ACID transactions on Iceberg, and more.
Some other highlights:
Please provide feedback and suggestions if you want me to add any other topics.
Live sessions (6-8 hours) every week. Links to the sessions will be shared once you enrol, and recordings of the live sessions will be made available to all learners.
Our curriculum is designed to take you from beginner to expert using production-size datasets and production-like scenarios in all courses.
Interact and network with like-minded folks from various backgrounds in Live Sessions.
Stuck on something? Discuss it with your peers and the instructors in the inbuilt chat groups.
Each course contains a minimum of 8-10 projects with at least 150-200 GB of datasets. Completing all the projects is highly recommended to gain an understanding of real-life scenarios.
Flaunt your skills with course certificates, which you can showcase on LinkedIn with a single click.